Valid Statistical Inference on Automatically Matched Files

Authors

  • Rob Hall
  • Stephen E. Fienberg
Abstract

We develop a statistical process for determining a confidence set for an unknown bipartite matching. It requires only modest assumptions on the nature of the distribution of the data. The confidence set involves a set of linear constraints on the bipartite matching, which permits efficient analysis of the matched data, e.g., using linear regression, while maintaining the proper degree of uncertainty about the linkage itself.
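
The abstract does not spell out how the confidence set is constructed, so the following is only a toy sketch of the general idea, not the authors' procedure. It assumes a hypothetical confidence set made of all one-to-one matchings whose total cost stays within a budget (a single linear constraint on the matching matrix), and it reports the range of the least-squares slope over that set. The names `ols_slope`, `slope_range`, `cost`, and `budget` are illustrative assumptions, and the brute-force enumeration is only feasible for very small files.

```python
from itertools import permutations


def ols_slope(xs, ys):
    """Least-squares slope of y on x for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_range(xs, ys, cost, budget):
    """Range of the OLS slope over a hypothetical confidence set of matchings.

    A matching is a permutation `perm` that links record i of the first file
    to record perm[i] of the second file. The assumed confidence set keeps
    every matching whose total cost, sum_i cost[i][perm[i]], is at most
    `budget`; this is linear in the 0/1 entries of the matching matrix.
    Returns (min_slope, max_slope), or None if no matching qualifies.
    """
    slopes = []
    for perm in permutations(range(len(ys))):
        total = sum(cost[i][perm[i]] for i in range(len(xs)))
        if total <= budget:
            matched_ys = [ys[j] for j in perm]
            slopes.append(ols_slope(xs, matched_ys))
    if not slopes:
        return None
    return min(slopes), max(slopes)


# Illustrative data: three records per file, cost grows with index distance.
xs = [1, 2, 3]
ys = [2, 4, 6]
cost = [[abs(i - j) for j in range(3)] for i in range(3)]

certain = slope_range(xs, ys, cost, budget=0)    # only the identity matching
uncertain = slope_range(xs, ys, cost, budget=2)  # identity plus adjacent swaps
```

With a budget of zero the linkage is treated as known and the slope is a single value; loosening the budget widens the interval, which is one simple way to carry linkage uncertainty into the regression result.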

Similar resources

Adaptive Matching in Randomized Trials and Observational Studies.

In many randomized and observational studies the allocation of treatment among a sample of n independent and identically distributed units is a function of the covariates of all sampled units. As a result, the treatment labels among the units are possibly dependent, complicating estimation and posing challenges for statistical inference. For example, cluster randomized trials frequently sample ...

Testing linkage disequilibrium in sibships.

We describe the use of multivariate regression for testing allelic association in the presence of linkage, using marker genotype data from sibships. The test is valid, provided that the correct mean structure is modeled but does not require the correlation structure within families to be specified. The test can be implemented using standard statistical software such as the SAS programming langu...

Data Fusion: Identification Problems, Validity, and Multiple Imputation

Data fusion techniques typically aim to achieve a complete data file from different sources which do not contain the same units. Traditionally, data fusion, in the US also addressed by the term statistical matching, is done on the basis of variables common to all files. It is well known that those approaches establish conditional independence of the (specific) variables not jointly observed giv...

Seaview: Using Fine-Grained Type Inference to Aid Log File Analysis

Log files contain a lot of valuable information for debug and monitoring. However, sense-making using them is a cumbersome task because they are typically stored and interpreted as plain text. We propose a mechanism to restore some semantics to log files, by performing static analysis on a Java program to automatically infer fine-grained dimensional information for the values being logged [5]. ...

Valid Post-Selection Inference

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid “post-selection inference” by reducing the problem to...

Publication date: 2012